ABSTRACT

Problems in conducting an outcome evaluation under
uncertain knowledge of effects are addressed. This evaluation was
designed to obtain outcome assessments of a National Science 
Foundation Workshop, which was working with 20 high school chemistry 
teachers and 20 high school biology teachers to stimulate effective 
teaching, assist teachers in developing instructional modules, and 
provide a broader network of scientists to relate to high school 
science activities. The initial evaluation design called for 
self-reports of program impact from participants, but hard data, in
terms of test scores, became necessary to document impact more 
directly. Three measures were selected to gather data on workshop 
impact: (1) a questionnaire determining the reactions of 
participants; (2) a questionnaire examining the utility for teachers 
of activities in the workshop as translated into their teaching; and 
(3) an instrument, Our Class and Its Work (OCIW), developed by M. J.
Eash and H. Waxman (1983), which was used to gather information from 
students of workshop teachers (N=442) and students of control 
teachers (N=384). Students' perceptions of classroom approach 
paralleled teacher self-reports; when teachers reported a change in
classroom behavior, OCIW student data supported a change. The problems
in outcome evaluation illustrated by this study indicate the need to 
rely on more than just test scores in evaluating program impact. 
(SLD) 
Determining Outcomes for Evaluation of
A National Science Foundation Workshop 



A number of years ago in a speech before graduate students Paul Lazarsfeld, the Sociologist, 
observed that all evaluation studies had four components: What outcomes was the study interested 
in? How were these outcomes measured? What comparisons were made? What specification and 
generalization of the outcomes could be drawn? All evaluations in Lazarsfeld's paradigm of
evaluation studies point toward given outcomes (effects), and I suspect that this is as true for most
evaluations today as it was over thirty years ago when I first encountered his observation. The
assumption behind such exclusively oriented outcome evaluation is that project director and 
evaluator can know in advance and identify with certainty, if not certitude, the effects of treatment 
as manifest in outcomes. The case that I shall examine in this paper is that of conducting an 
outcome evaluation under uncertain knowledge of effects. The evaluation was designed to obtain 
outcome assessments of an NSF workshop for high school science teachers, entitled "Applications 
of Basic Science in Industry and Society to Enhance Secondary School Science". 

This workshop-seminar was part of the Teacher Enhancement effort funded by the Directorate
in Science and Engineering under the National Science Foundation. It was designed to work with
20 Chemistry and 20 Biology teachers in a three-week summer workshop and a year-long series of
six seminars. The objectives as submitted with the project were:

"1. To stimulate effective teaching approaches to theories and concepts in Biology and
Chemistry for secondary science teachers who are teaching in several science
disciplinary fields.

2. To build curriculum units from industrial and societal applications of conceptual and
syntactical science using teaching strategies geared to the range of students' abilities.

3. While the major purpose of the project is teacher enhancement, a secondary focus is to
provide training for teacher-leaders who will be identified and encouraged to give
leadership in their schools and to science education in the Commonwealth.

4. To provide lesser prepared Biology and Chemistry teachers (especially those teaching
in other than their major) an opportunity to build enriched curriculum units and
develop more effective teaching strategies.

5. To develop long-range networking opportunities for secondary science teachers in
the metropolitan area to work together and profit from a continuous exchange of new
materials and teaching strategies.

6. To provide impetus and incentive for development of long term collaborative relationships
between secondary science teachers, college and university scientists, and industry and
government affiliated scientists."

These six objectives had three central foci: to stimulate more effective teaching, to assist
teachers with developing instructional modules, and to provide a broader network of scientists to
relate to high school science activities. To achieve these objectives a series of activities was
designed which included visits to 10 industries, agencies and natural sites; the development by
teams of teachers of instructional modules; and laboratory work and lectures; all of which culminated in
units of instruction which were taught in the academic year following the summer workshop. In
the six academic year seminars, teachers' teams presented their efforts, described the
implementation in the classroom, and received critiques on their work. The completed modules
were written up, printed and distributed to all participants.

An evaluation design which had been sketched in the original project called for an external
evaluator to gather data from participants on their reaction to the workshop, i.e., did it accomplish
the six objectives, and to verify these responses through interviews with teachers and
administrators in their schools. After the first summer workshop was finished, our program
officer at NSF said that the Foundation, for Congressional support, needed to receive data that more
fully documented the "impact" of the workshops on the participants. When queried on what was
considered impact data, he said, "test scores". This set off a scramble to redesign the evaluation to
be more specific and responsive to NSF's understanding of outcomes. It was made quite clear that
self-report data on participants' enthusiastic response to the workshops were no longer acceptable
as evaluation outcome data. Hard evidence, "test scores", was required.

In resolving this evaluation design problem, we encountered a range of philosophical and
technical problems which are posed by outcome evaluation. These are presented under four
questions:

1. Can the outcomes of the impact of the workshop be identified?

Given the mandate to produce "test scores", and specifically achievement test evidence that
would demonstrate that the teachers had learned "more science", a quick search was made of
standardized tests that were available. As the teachers were preparing their own modules from
their summer activities, these deviated from the standard textbook curriculum. It took limited
detective work to determine that the achievement tests available for use with either students or
teachers lacked curriculum validity. We also came to the conclusion that a logical linear model of
evaluation (objectives specified -> activities performed -> outcomes relating to objectives
measured -> conclusions drawn from the analysis of these data) was too reductionistic of the
dynamics involved in this experience-centered approach to teacher enhancement and curriculum
improvement. Moreover, the workshop's objectives as drawn in the original proposal were broad
and general - unprescriptive as to curriculum treatment. Indeed, this audience scarcely needs to be
reminded that achievement tests have severe limitations in measuring goals of curriculum that
encompass more than direct knowledge of subject matter - interest, curiosity, problem solving
skill, critical analysis - to name a few domains that are largely missing from the psychometric tasks.
We determined early on that we must cast our evaluation net much more broadly to detect the impact of
the workshop-seminar treatment.

As the result of this limitation of extant tests, it was decided to seek different measures that
were both more sensitive and representative of the intents of the workshop. In this decision three
types of measures were selected to gather data on the workshop treatment:

(1) A questionnaire prepared by the external evaluator on participants' reaction to the different
components of the workshop and to document the implementation of an instructional
treatment.

(2) A questionnaire prepared by the internal evaluator and administered at the conclusion of the
workshops and seminars asking teacher participants about the utility of the activities in the
workshop as translated into their teaching, e.g., had they used any of the sites visited in their
own classroom field trips.

(3) A third instrument, Our Class and Its Work (OCIW), was administered to all participants
and to two separate control groups.

This instrument has the advantage of gathering student perceptions of teachers' classroom
behaviors that teaching research has shown to affect student achievement (Waxman & Eash, 1983).
The OCIW (Eash & Waxman, 1983), a learning environment measure, is composed of eight scales
of forty items which describe teaching behaviors. A ninth scale of ten items was added to gather
data on students' reactions and attitudes toward science. Students respond to items on a four-point
scale of strongly agree to strongly disagree. The scales and sample items are presented in Table I.



TABLE I
SCALES AND DESCRIPTION OF OCIW
[Reliabilities in parentheses are Cronbach's alphas]

Didactic Instruction (.87)
Implies that the teacher controls and directs the instruction for all students in the class.
Sample item: "Our teacher lets us do things our own way."

Enthusiasm (.92)
Considers the extent to which a student sees the teacher exhibit excitement and interest in teaching.
Sample item: "We try new and different things in our classroom."

Feedback (reliability not given)
Describes the extent to which the teacher responds to students' answers and provides students with
feedback about their schoolwork.
Sample item: "Our teacher carefully checks all our work."

Instructional Time (.86)
Refers to the time students spend in learning.
Sample item: "We are always working in our class."

Opportunity to Learn (.92)
Indicates how well the teacher provides opportunities for all students to learn or cover criterion
material.
Sample item: "Many students do not finish their work."

Pacing (.92)
Deals with whether or not the classroom work is at the appropriate level of difficulty for students in
the class.
Sample item: "Our teacher spends too much time going over work."

Structuring Comments (.9[illegible])
Refers to whether the teacher provides overviews at the beginning and end of instructional
sequences and whether students understand.
Sample item: "Our teacher often reviews yesterday's work."

Task Orientation (.84)
Indicates the extent to which the classroom is businesslike.
Sample item: "We always have an assignment to work on."

Attitude Toward Science ([illegible])
Indicates students' attitudes toward importance of science in society, the scientific method in their
lives, and science as a chosen career.
Sample item: "Scientists improve our lives."
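
Table I reports scale reliabilities as Cronbach's alphas. As a minimal sketch of how such a coefficient can be computed, the Python below applies the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), to one scale. The function name, the response data, and the coding of the four-point scale as 1 (strongly disagree) through 4 (strongly agree) are assumptions made for illustration; none of the numbers are OCIW data, and any negatively worded items would need to be reverse-coded before this step.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.

    responses: one list per student, each holding that student's scores
    on the same k items in the same order.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across students.
    item_vars = [variance([r[i] for r in responses]) for i in range(k)]
    # Sample variance of each student's summed scale score.
    total_var = variance([sum(r) for r in responses])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: five students, one four-item scale.
hypothetical_responses = [
    [4, 3, 4, 4],
    [3, 3, 3, 4],
    [2, 2, 1, 2],
    [1, 2, 2, 1],
    [3, 4, 3, 3],
]

print(f"alpha = {cronbach_alpha(hypothetical_responses):.2f}")
```

Alpha approaches 1 when the items of a scale rise and fall together across students, which is the sense in which the coefficients in Table I indicate internally consistent scales.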



In the absence of specific objectives detailing subject matter to be learned, and in the absence
of achievement tests which would measure with specificity and accuracy the teachers' achievement
of science taught in the workshop, a more indirect measure of students' perception was used as the
principal outcome measure. However, this measure had been validated on criteria linking teacher
behavior to increased or decreased student achievement. This indirect measure of impact was
related to achievement outcomes and did not depart from the original, more open-ended objectives of
the project. The results of comparisons of the NSF Workshop and comparison groups are carried
in Table II.
TABLE II

COMPARISONS OF IMPACT OF WORKSHOPS AND SEMINARS
ON 1988 PARTICIPANTS'
CLASSROOM TEACHING BEHAVIORS AND STUDENT ACHIEVEMENT
(OCIW)

GROUP COMPARISONS                                    N    Statistic   Level of
                                                                      Significance

1. PRE and POST TEST:
   NSF Participants' Classes
     Biology                                        136     17.14        .01
     Chemistry                                      306      6.93        .01
     Combined Biology & Chemistry                   442     12.34        .01

2. 1988 NSF PARTICIPANTS' CLASSES
   COMPARED TO NON NSF CLASSES:
     Combined Biology & Chemistry 1988              442     21.56        .01
     Combined Non NSF classes                       384

3. 1988 NSF PARTICIPANTS (Post Test May 1989)
   COMPARED TO
   1989 NSF PARTICIPANTS (Pre Test May 1989):
     Combined Biology & Chemistry 1988              442     19.31        .01
     Combined Biology & Chemistry 1989              630

4. EXPERIENCED TEACHERS 1988
   COMPARED TO
   INEXPERIENCED TEACHERS                           442     14.85        .01

5. BOYS COMPARED TO GIRLS,
   1988 VS WORKSHOP CLASSES                         442      2.12        NS



2. Is the outcome the specific result of the treatment, and if so, is it specific to
components of the treatment?

The traditional issues of assessing outcome effects in an evaluation design are embedded in
this outcome problem (Lipsey, 1988). To control for the effects of other variables, three types of
OCIW data on experimental and control groups were gathered:

(1) pre and post measures were gathered on the first year participants' classes;

(2) data on a similar science class of a non-participating teacher in the same school were gathered;

(3) data on a science class of a non-participating teacher at the end of the year were gathered.

Quite large sample sizes were obtained in each comparison (ranging from 136 to 630). Due
to the large sample sizes on the controls and experimental populations, the workshop treatment
would become the major measurable difference between the two groups, thus lending credence to
the belief that these differences were the results of treatment and not variables extraneous to the
study.

Components of the treatment and their success in achieving specific measurable outcomes
were more difficult to ascribe. In the measure on utility of workshop experiences in classroom
enhancement, teachers were asked which workshop experiences they were able to incorporate
directly into their classrooms - as one measure of outcomes being a specific result of treatment. Of
the fourteen activities in the workshop treatment that were checked for direct classroom outcomes,
nine were cross validated in the OCIW (see Table III).
TABLE III

UTILITY OF WORKSHOP EXPERIENCES
ACTIVITIES IN CLASSROOM ENHANCEMENT
(Participants Self-Report)

Activity or Experience                                        Percentage

Have taken a field trip(s) that replicated those                  42
  in NSF Workshop
Have used new equipment in my teaching                            64
  this year
Have used a different method to teach a concept                   75
  or major learning *
Have introduced new materials into regular                        93
  curriculum *
Have had students plan and organize cooperative                   43
  group projects *
Have changed my approach to incorporate more                      21
  student planning in new curriculum
Have brought more resource people from                            29
  industry into my classroom
Have shared my project experience and                             93
  material with other teachers
Have included in classroom work direct applications               82
  of science concepts in industry *
Have observed increased student interest in science               54
  as a career this year *
Have increased use of questioning in my class                     57
  this year *
Have found students with a wider range of academic                50
  abilities more interested in science this year *
Have focused specific lessons on the importance of                75
  science in society and specific community *
Have expanded the science content taught into                     82
  new areas this year *

* Item verified by student observation data from OCIW
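
The cross-check between teacher self-reports and student data in Table III can be tabulated in a few lines. The Python below is only an illustrative sketch: the activity labels are shortened paraphrases of the table rows, the percentages and asterisk flags are copied from Table III, and the names `TABLE_III` and `summarize` are hypothetical.

```python
# (short label, percent of teachers reporting it, verified by OCIW student data)
TABLE_III = [
    ("Field trips replicating workshop visits", 42, False),
    ("New equipment used in teaching", 64, False),
    ("Different method to teach a concept", 75, True),
    ("New materials in regular curriculum", 93, True),
    ("Student-planned cooperative group projects", 43, True),
    ("More student planning in new curriculum", 21, False),
    ("More resource people from industry", 29, False),
    ("Shared project experience with other teachers", 93, False),
    ("Direct applications of science concepts in industry", 82, True),
    ("Increased student interest in science as a career", 54, True),
    ("Increased use of questioning", 57, True),
    ("Wider range of students interested in science", 50, True),
    ("Lessons on importance of science in society", 75, True),
    ("Science content expanded into new areas", 82, True),
]


def summarize(rows):
    """Return (number of activities, number verified by student data)."""
    verified = [row for row in rows if row[2]]
    return len(rows), len(verified)


total, verified = summarize(TABLE_III)
print(f"{verified} of {total} self-reported activities were verified by OCIW data")
```

Keeping the verification flag beside each self-reported percentage makes it easy to see which teacher claims rest on two sources of data and which rest on self-report alone.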



The activities were not specifically delineated in the original objectives and were an outgrowth
of treatment as the workshop progressed. This reflected a problem in outcome evaluation where
outcomes are not always known in advance, an issue on which more will be said later.

3. Can the outcomes be assessed directly?

Our linear models of evaluation and the concern for direct unequivocal evidence lead users of
evaluation to long for direct measures, i.e., in student achievement primarily test evidence. We
could not measure with disciplinary tests the direct effects of the workshop's treatment, so we
settled for use of a surrogate variable, teacher behavior. However, we did use teacher behavior in
the context of promotion of student learning. That is, we used an instrument that had been
validated as detecting those teacher behaviors that increased student achievement.

The selection of surrogate variables is important, but it relates to a more significant evaluation
consideration, the evolution of objectives as the project progresses. In short, we get smarter as we
gain experience in a project. For example, we over-estimated the effect that teacher prepared
materials would have on reducing the amount of didactic instruction and under-estimated it on
producing gains in the other 7 scales. In addition, the project directors did not anticipate that the
treatment would have much the same effect on boys and girls and reduce the gender differences
that have generally prevailed in other studies of gender differences in science achievement and
attitudes toward science.

What gives us most confidence in our indirect measure of outcomes, through student
perceptions, is the consistency of the three sources of data. Student perceptions paralleled teacher
self-report data whether they were descriptive or an opinion. When teachers reported a change of
classroom approach, the OCIW student data substantiated this response.

4. Can the outcome results be specified (variable relationships) and generalized 
to other groups? 

This outcome question brings us back to an earlier concern on linking the cause and effect on
a linear model. Our treatment has been documented in videos and the text activities can be
replicated by others. In these respects it is both specific and generalizable. As defined in the
original objectives it is not. Our learning of what was effective came as the workshop faculty
interacted with participants and received feedback from the context of conducting the planned
activities. For example, there was a need to change some of the sites. In addition, in the second
summer workshop the site visits were restructured and became more focused and in-depth than in
the first summer.

The evolutionary nature of broadly based instructional treatments means that evaluative
measures for outcome purposes, if too narrowly conceived, especially where objectives evolve as
the project proceeds, will fail to detect important outcomes. Thus the call for achievement gain
scores based on standardized tests may very well provide outcomes which report impoverished
results when what is involved is impoverished measurement.

This problem of outcome evaluation was brought home very sharply to me a number of years
ago when I was called in to evaluate a Title III teacher development project. While doing my data
gathering I was asked specifically to look in on a project that focused on support for beginning
teachers in several high schools. The project released a very strong, creative Biology teacher who
worked with a group of 13 new teachers. An outgrowth of a concern for the high attrition of new
teachers in this district of large suburban high schools, the project had been underway for 6
months when I came on the scene. As in many cases when an evaluator is asked to do an add-on
once into the assessment, this one had a special interest feature to it. The Title III project had an
internal evaluator who was charged with gathering data on the effectiveness of the 15 projects,
ostensibly to build a plausible case for renewal of funding. What I did not know when asked to
look closely at the new teacher support project was that the internal evaluator and project director
had come to fisticuffs over evaluation and the internal evaluator had been closed out of the scene.
When I met with the project director, who proved to be an engaging chap with much teaching
moxie, he gave me a quick explanation of the conflict. "He (the evaluator) kept asking me to tell
him what outcomes I wanted - stated as objectives - so he could develop an evaluation design.
When I stated that my outcome was to retain new teachers in the system, he said that was too vague
an objective for evaluation and I had to get more specific." The project director responded that he
was busy with planning his program and having daily meetings with the teachers. The internal
evaluator said the director had to have specific objectives for his activities so he could evaluate them, or he
would have the funding cut off. Already tense from venturing into new ground, this was too
much for the project director-teacher to take and he proceeded to reenact an academic version of the
Golden Gloves with the internal evaluator.

Our difficulties in assessing the outcomes of this NSF workshop were addressed by Jonathan
Z. Shapiro in his writing. Early on in his career he pointed out the need to attend to both
methodological and social concerns in an evaluation (Shapiro, 1984). An evaluation, Shapiro
claimed, is unique not in purpose but in environment (Shapiro, 1985). Hence the evaluator does
not generate achievement "test scores" as automatic outcome measures. The unquestioning use of
such standardized measures has certainly not raised the confidence level of decision makers in the
sophistication of evaluators nor added to the utility of their findings.

Respecting the dynamic nature of the environment, influencing the treatment as it did, negated
the belief that an evaluator could draw up a rigorous design sufficiently explanatory directly from
the original objectives of a proposal. Neither project designer nor evaluator was this farsighted at
the origin of the proposal. If one could accept the ambiguity of incompleteness and engage without
reservation in the search for partial knowledge, then evaluator, decision makers, and project
subjects could be employed in the evaluation design. From the conception of dynamic
environmental dependence of evaluation and the evolving of objectives as a project progressed,
Shapiro became committed to participant evaluation - where participants became actively involved
in the evaluation, particularly seeking unanticipated outcomes in the data gathering (Shapiro, 1987).
In addition he was able to demonstrate how involvement could be used without sacrificing
methodological rigor and in fact would strengthen the evaluation and enhance the breadth and use of
the findings (Shapiro, 1984a).

What was central to any evaluation, Shapiro held, is a commitment to a concept of social
justice. Thus in conducting an evaluation one is required to search for issues of larger meaning
than simply the easy to measure - forestalling the rush to the standardized achievement test
(Shapiro, 1986). In this belief our evaluation of the NSF Workshop gives attention to students'
thoughts and opinions about science and society, science and citizenship, and science and a
constructive career. As our teachers read these findings and see their effect, "impact", on student
outcomes they will understand more thoroughly than ever the moral source of teaching. By moral
I mean the shaping and directing of individuals' lives which, without the unique experience in
science, might never have ventured into these domains of knowledge and thought, or more
extremely, chosen to commit one's life-work to science.

Shapiro in his life and work saw outcome evaluation in the larger sphere of human endeavor. 
Evaluation of outcomes should be conducted to assist projects to contribute to broader choice 
(essentially to greater freedom), and, therefore, a more just society, and a more humane spirit.
That is indeed a fitting legacy for any professional career and one all evaluators could aspire to. 